|
In mathematics and computing universal hashing (in a randomized algorithm or data structure) refers to selecting a hash function at random from a family of hash functions with a certain mathematical property (see definition below). This guarantees a low number of collisions in expectation, even if the data is chosen by an adversary. Many universal families are known (for hashing integers, vectors, strings), and their evaluation is often very efficient. Universal hashing has numerous uses in computer science, for example in implementations of hash tables, randomized algorithms, and cryptography. == Introduction == Assume we want to map keys from some universe into bins (labelled ). The algorithm will have to handle some data set of keys, which is not known in advance. Usually, the goal of hashing is to obtain a low number of collisions (keys from that land in the same bin). A deterministic hash function cannot offer any guarantee in an adversarial setting if the size of is greater than , since the adversary may choose to be precisely the preimage of a bin. This means that all data keys land in the same bin, making hashing useless. Furthermore, a deterministic hash function does not allow for ''rehashing'': sometimes the input data turns out to be bad for the hash function (e.g. there are too many collisions), so one would like to change the hash function. The solution to these problems is to pick a function randomly from a family of hash functions. A family of functions is called a universal family if, . In other words, any two keys of the universe collide with probability at most when the hash function is drawn randomly from . This is exactly the probability of collision we would expect if the hash function assigned truly random hash codes to every key. Sometimes, the definition is relaxed to allow collision probability . This concept was introduced by Carter and Wegman〔 〕 in 1977, and has found numerous applications in computer science (see, for example 〔 〕). If we have an upper bound of on the collision probability, we say that we have -almost universality. Many, but not all, universal families have the following stronger uniform difference property: : , when is drawn randomly from the family , the difference is uniformly distributed in . Note that the definition of universality is only concerned with whether , which counts collisions. The uniform difference property is stronger. (Similarly, a universal family can be XOR universal if , the value is uniformly distributed in where is the bitwise exclusive or operation. This is only possible if is a power of two.) An even stronger condition is pairwise independence: we have this property when we have the probability that will hash to any pair of hash values is as if they were perfectly random: . Pairwise independence is sometimes called strong universality. Another property is uniformity. We say that a family is uniform if all hash values are equally likely: for any hash value . Universality does not imply uniformity. However, strong universality does imply uniformity. Given a family with the uniform distance property, one can produce a pairwise independent or strongly universal hash family by adding a uniformly distributed random constant with values in to the hash functions. (Similarly, if is a power of two, we can achieve pairwise independence from an XOR universal hash family by doing an exclusive or with a uniformly distributed random constant.) Since a shift by a constant is sometimes irrelevant in applications (e.g. hash tables), a careful distinction between the uniform distance property and pairwise independent is sometimes not made.〔 〕 For some applications (such as hash tables), it is important for the least significant bits of the hash values to be also universal. When a family is strongly universal, this is guaranteed: if is a strongly universal family with , then the family made of the functions fails to be universal. UMAC and Poly1305-AES and several other message authentication code algorithms are based on universal hashing.〔 David Wagner, ed. ("Advances in Cryptology - CRYPTO 2008" ). p. 145. 〕〔 Jean-Philippe Aumasson, Willi Meier, Raphael Phan, Luca Henzen. ("The Hash Function BLAKE" ). 2014. p. 10. 〕 In such applications, the software chooses a new hash function for every message, based on a unique nonce for that message. Several hash table implementations are based on universal hashing. In such applications, typically the software chooses a new hash function only after it notices that "too many" keys have collided; until then, the same hash function continues to be used over and over. (Some collision resolution schemes, such as dynamic perfect hashing, pick a new hash function every time there is a collision. Other collision resolution schemes, such as cuckoo hashing and 2-choice hashing, allow a number of collisions before picking a new hash function). 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Universal hashing」の詳細全文を読む スポンサード リンク
|